Abstract — In  this  letter,  we  conduct  a  comparative  study  and 
investigate  the  relationship  between  two  well-known  techniques 
in  hyperspectral  image  detection  and  classification:  orthogonal 
subspace  projection  (OSP)  and  constrained  energy  minimization 
(CEM).  It  is  shown  that  they  are  closely  related  and  essentially 
equivalent  provided  that  the  noise  is  white  with  large  SNR.  Based 
on  this  relationship,  the  performance  of  OSP  can  be  improved  via 
data-whitening  and  noise-whitening  processes. 

Index  Terms — Classification,  constrained  energy  minimization 
(CEM),  detection,  hyperspectral  imagery,  orthogonal  subspace 
projection  (OSP). 


I.  Introduction 

LINEAR UNMIXING has been widely used for hyperspectral image
detection and classification [1]-[9]. It models a hyperspectral
image pixel as a linear mixture of a finite set of image endmembers
that are assumed to be present in the image data. Detection and
classification are then performed by unmixing the pixel and finding
the respective abundance fractions of the endmembers present in the
pixel. Several approaches have been studied in the past, such as
singular value decomposition [2], subspace projection [9], and
maximum likelihood (ML) [10]. The relationship between ML and
subspace projection was investigated in [10]-[12], where ML-based
linear unmixing was shown to be equivalent to orthogonal subspace
projection (OSP)-based linear unmixing, provided that the noise in
the linear mixing model is white Gaussian noise. Unfortunately, such
linear unmixing methods require complete knowledge of the image
endmembers. On many practical occasions, obtaining this prior
knowledge may not be realistic. To resolve this issue, a method
referred to as constrained energy minimization (CEM) was developed
in [13], where the only required knowledge is the desired image
endmember rather than the entire set of image endmembers. The
relationship between CEM and linear unmixing was recently studied
in [14]. They were also investigated as matched-filter detectors in
[15]. Both OSP and CEM have been successfully applied to
hyperspectral image detection and classification because of their
effectiveness and simplicity. Although the two approaches require
different levels of knowledge, they turn out to be closely related,
and this relationship is explored in this letter.

The OSP is based on the linear mixture model, which says that a
hyperspectral pixel vector $\mathbf{r}$ of size $K \times 1$ with $K$
spectral bands can be represented as

$$\mathbf{r} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{n} \tag{1}$$

where $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_p]$
is a $K \times p$ signature matrix with $p$ endmembers, and
$\mathbf{s}_i$ is the $i$th endmember signature;
$\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_p]^T$ is a
$p \times 1$ abundance fraction vector whose $i$th element $\alpha_i$
represents the abundance fraction of $\mathbf{s}_i$ present in that
pixel; and $\mathbf{n}$ is a $K \times 1$ vector that can be
interpreted as a noise term or model error. For the OSP, the signature
matrix $\mathbf{S}$ in (1) is further divided into two parts: the
desired signature of interest $\mathbf{d}$ and the undesired signature
matrix $\mathbf{U}$. Without loss of generality, we assume
$\mathbf{d}$ is the first endmember signature $\mathbf{s}_1$, and
$\mathbf{U}$ is formed by the rest of the signatures
$[\mathbf{s}_2 \cdots \mathbf{s}_p]$, i.e.,
$\mathbf{S} = [\mathbf{d}\ \mathbf{U}]$. Then, (1) can be rewritten as

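$$\mathbf{r} = \mathbf{d}\alpha_d + \mathbf{U}\boldsymbol{\alpha}_U + \mathbf{n} \tag{2}$$

where $\alpha_d$ is the abundance fraction of the desired signature
$\mathbf{d}$, and $\boldsymbol{\alpha}_U$ is a $(p-1) \times 1$
abundance fraction vector of the undesired signatures in $\mathbf{U}$,
with $\boldsymbol{\alpha} = (\alpha_d\ \boldsymbol{\alpha}_U^T)^T$.
Under the white-noise assumption, the OSP classifier
$\mathbf{P}_{\mathrm{OSP}}$ was derived in [9] as

$$\mathbf{P}_{\mathrm{OSP}} = \mathbf{P}_U^{\perp}\mathbf{d} \tag{3}$$

where $\mathbf{P}_U^{\perp} = \mathbf{I} - \mathbf{U}(\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T$
is the orthogonal complement projector that maps data onto a subspace
orthogonal to the undesired signatures in $\mathbf{U}$. Here,
$\mathbf{I}$ denotes the $K \times K$ identity matrix. According to
[10] and [11], when the OSP is implemented as an abundance estimator,
a constant term $(\mathbf{d}^T\mathbf{P}_U^{\perp}\mathbf{d})^{-1}$
should be included to account for estimation accuracy. Then (3)
becomes

$$\mathbf{P}_{\mathrm{OSP}} = (\mathbf{d}^T\mathbf{P}_U^{\perp}\mathbf{d})^{-1}\mathbf{P}_U^{\perp}\mathbf{d}. \tag{4}$$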
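As a rough illustration of (3) and (4), the following NumPy sketch builds the projector and the normalized OSP estimator for synthetic signatures. The function name `osp_filter`, the band and endmember counts, and the random signatures are assumptions made for this example; they are not taken from the experiments in this letter.

```python
import numpy as np

def osp_filter(d, U):
    """Return (d^T P d)^{-1} P d with P = I - U (U^T U)^{-1} U^T, as in (4)."""
    K = d.shape[0]
    # Orthogonal-complement projector of the undesired signatures.
    P = np.eye(K) - U @ np.linalg.solve(U.T @ U, U.T)
    Pd = P @ d
    return Pd / (d @ Pd)

rng = np.random.default_rng(0)
K, p = 20, 4                          # hypothetical band / endmember counts
S = rng.uniform(0.1, 1.0, (K, p))     # synthetic signatures, not real spectra
d, U = S[:, 0], S[:, 1:]
w_osp = osp_filter(d, U)

alpha = np.array([0.4, 0.3, 0.2, 0.1])
r = S @ alpha                         # noise-free mixed pixel following (1)
alpha_d_hat = w_osp @ r               # estimate of the first abundance
```

By construction the filter responds with 1 to $\mathbf{d}$ and with 0 to every column of $\mathbf{U}$, so in this noise-free case the estimate equals the true abundance of the desired signature.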

In some situations, we may be interested only in a certain object
present in an unknown image scene, and only its spectral signature is
available. CEM was developed for such cases. It designs a
finite-impulse response filter in such a manner that the filter output
energy is minimized subject to a constraint imposed by the desired
signature of interest $\mathbf{d}$. It does not assume the linear
mixture model or any noise characteristics. Let the filter be
specified by the coefficients
$\mathbf{w} = [w_1, w_2, \ldots, w_K]^T$. Then, the filter output for
the input $\mathbf{r}_i$ is expressed by
$y_i = \mathbf{w}^T\mathbf{r}_i$. The average output energy $E$ is
given by

$$E = \frac{1}{q}\sum_{i=1}^{q} y_i^2 = \mathbf{w}^T\mathbf{R}_r\mathbf{w} \tag{5}$$

where $\mathbf{R}_r = (1/q)\sum_{i=1}^{q}\mathbf{r}_i\mathbf{r}_i^T = \langle\mathbf{r}\mathbf{r}^T\rangle$
is the data sample correlation matrix. Here, $\langle\cdot\rangle$
denotes the sample average over all pixels and $q$ is the total number
of pixels in the image. An optimal filter $\mathbf{w}$ should be the
one that minimizes $E$ in (5) subject to the constraint
$\mathbf{w}^T\mathbf{d} = 1$. The solution of this constrained problem
is [13]

$$\mathbf{w}_{\mathrm{CEM}} = (\mathbf{d}^T\mathbf{R}_r^{-1}\mathbf{d})^{-1}\mathbf{R}_r^{-1}\mathbf{d}. \tag{6}$$
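The CEM solution in (6) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `cem_filter`, the synthetic pixel matrix, and the comparison filter `w_alt` are assumptions introduced here, and the sample correlation matrix is assumed to be invertible.

```python
import numpy as np

def cem_filter(X, d):
    """X: (q, K) pixel matrix, d: (K,) desired signature.

    Returns the CEM filter of (6) and the sample correlation matrix R_r.
    """
    q = X.shape[0]
    R = X.T @ X / q                    # R_r = <r r^T>
    Rinv_d = np.linalg.solve(R, d)     # R_r^{-1} d without forming the inverse
    return Rinv_d / (d @ Rinv_d), R

rng = np.random.default_rng(0)
q, K = 500, 10                         # hypothetical pixel / band counts
X = rng.uniform(0.0, 1.0, (q, K))      # synthetic data, not HYDICE
d = rng.uniform(0.1, 1.0, K)
w_cem, R = cem_filter(X, d)

# Any other filter with w^T d = 1 should have output energy at least as large.
w_alt = d / (d @ d)
E_cem = w_cem @ R @ w_cem
E_alt = w_alt @ R @ w_alt
```

Solving the linear system instead of inverting $\mathbf{R}_r$ explicitly is a numerical convenience and does not change the filter.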


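Using (6), a CEM-based filter can be designed to detect the desired
target $\mathbf{d}$ while minimizing the filter output energy caused
by unknown signal sources.

The CEM generally outperforms the OSP in terms of eliminating
unidentified signal sources and suppressing noise, but it has a poor
generalization property since it is very sensitive to the knowledge of
the desired signature $\mathbf{d}$ used in (6). This is because a
pixel with a signature slightly different from the desired signature
$\mathbf{d}$ may be considered as undesired or unknown and will
therefore be eliminated. One way to mitigate this problem is to find a
good representative of $\mathbf{d}$ based on a large number of
samples. However, in some cases, a large number of samples may not be
available. Another way is to use only part of the eigenvalues and
eigenvectors of $\mathbf{R}_r$ to calculate $\mathbf{R}_r^{-1}$, as
was done in [16]. The problem with this approach is how to determine
the number of eigenvalues and eigenvectors to be used. In this letter,
we further investigate the relationship between the OSP and CEM, which
may help us to better understand the strengths and weaknesses of both
techniques and the conditions under which they perform well.


II.  Relationship  Between  OSP  and  CEM

Assume the sample mean is removed. Let
$\mathbf{m} = \mathbf{U}\boldsymbol{\alpha}_U + \mathbf{n}$. Then (2)
becomes

$$\mathbf{r} = \mathbf{d}\alpha_d + \mathbf{m}. \tag{7}$$

If $\langle\alpha_d\boldsymbol{\alpha}_U\rangle = \mathbf{0}$,
$\mathbf{R}_r$ can be represented as [17]

$$\mathbf{R}_r = \alpha_d^2\,\mathbf{d}\mathbf{d}^T + \mathbf{R}_m \tag{8}$$

where $\mathbf{R}_m = \langle\mathbf{m}\mathbf{m}^T\rangle$ is the
sample correlation matrix of $\mathbf{m}$.

The following matrix inversion lemma [18] is used to calculate
$\mathbf{R}_r^{-1}$ in (8).

Lemma: Let $\mathbf{A}$ and $\mathbf{B}$ be two positive-definite
$M \times M$ matrices related by

$$\mathbf{A} = \mathbf{B}^{-1} + \mathbf{C}\mathbf{D}^{-1}\mathbf{C}^T \tag{9}$$

where $\mathbf{D}$ is another positive-definite $N \times N$ matrix,
and $\mathbf{C}$ is an $M \times N$ matrix. Then $\mathbf{A}^{-1}$ can
be calculated by

$$\mathbf{A}^{-1} = \mathbf{B} - \mathbf{B}\mathbf{C}(\mathbf{D} + \mathbf{C}^T\mathbf{B}\mathbf{C})^{-1}\mathbf{C}^T\mathbf{B}. \tag{10}$$

Comparing (8) with (9), $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and
$\mathbf{D}$ correspond to $\mathbf{R}_r$, $\mathbf{R}_m^{-1}$,
$\mathbf{d}$, and $\alpha_d^{-2}$, respectively. Therefore

$$\begin{aligned}
\mathbf{R}_r^{-1} &= \mathbf{R}_m^{-1} - \mathbf{R}_m^{-1}\mathbf{d}\,(\alpha_d^{-2} + \mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{d}^T\mathbf{R}_m^{-1} \\
&= \mathbf{R}_m^{-1} - \mathbf{R}_m^{-1}\mathbf{d}\,\alpha_d^2\,(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{d}^T\mathbf{R}_m^{-1}
\end{aligned} \tag{11}$$

and $\mathbf{w}_0$ is defined as

$$\begin{aligned}
\mathbf{w}_0 &= \mathbf{R}_r^{-1}\mathbf{d} \\
&= \left[\mathbf{R}_m^{-1} - \mathbf{R}_m^{-1}\mathbf{d}\,\alpha_d^2\,(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{d}^T\mathbf{R}_m^{-1}\right]\mathbf{d}.
\end{aligned} \tag{12}$$

The term $\alpha_d^2(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d}$
in (12) is a scalar, so it can be moved to the front of the term
$\mathbf{R}_m^{-1}\mathbf{d}$ next to it. Then (12) becomes

$$\begin{aligned}
\mathbf{w}_0 &= \mathbf{R}_r^{-1}\mathbf{d} \\
&= \mathbf{R}_m^{-1}\mathbf{d} - \alpha_d^2\,(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d}\;\mathbf{R}_m^{-1}\mathbf{d} \\
&= \left[1 - \alpha_d^2\,(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d}\right]\mathbf{R}_m^{-1}\mathbf{d} \\
&= (1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\,\mathbf{R}_m^{-1}\mathbf{d}.
\end{aligned} \tag{13}$$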


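Substituting the result in (13) into (12) and (6), and noticing that
$(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})$ is a scalar,
we obtain

$$\begin{aligned}
\mathbf{w}_{\mathrm{CEM}} &= (\mathbf{d}^T\mathbf{R}_r^{-1}\mathbf{d})^{-1}\mathbf{R}_r^{-1}\mathbf{d} \\
&= \left[\mathbf{d}^T(1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\mathbf{R}_m^{-1}\mathbf{d}\right]^{-1}
   (1 + \alpha_d^2\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\mathbf{R}_m^{-1}\mathbf{d} \\
&= (\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\mathbf{R}_m^{-1}\mathbf{d}.
\end{aligned} \tag{14}$$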
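The algebra leading to (14) can be checked numerically. The sketch below is only a sanity check under assumed values: the matrix `R_m`, the signature `d`, and the scalar `alpha_d2` (standing in for $\alpha_d^2$) are arbitrary synthetic choices, not quantities from the letter.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 8
A = rng.standard_normal((K, K))
R_m = A @ A.T + K * np.eye(K)            # arbitrary positive-definite R_m
d = rng.uniform(0.1, 1.0, K)
alpha_d2 = 0.7                           # assumed value of alpha_d^2
R_r = alpha_d2 * np.outer(d, d) + R_m    # model of (8)

Rm_inv = np.linalg.inv(R_m)
Rm_inv_d = Rm_inv @ d
x = d @ Rm_inv_d                         # scalar d^T R_m^{-1} d

# Right-hand side of (11): rank-one correction of R_m^{-1}.
Rr_inv_lemma = Rm_inv - alpha_d2 / (1.0 + alpha_d2 * x) * np.outer(Rm_inv_d, Rm_inv_d)

# Both sides of (14): CEM filter from R_r and from R_m.
Rr_inv_d = np.linalg.solve(R_r, d)
w_from_Rr = Rr_inv_d / (d @ Rr_inv_d)
w_from_Rm = Rm_inv_d / x
```

Because $\mathbf{R}_m$ is symmetric, the outer product of $\mathbf{R}_m^{-1}\mathbf{d}$ with itself equals $\mathbf{R}_m^{-1}\mathbf{d}\mathbf{d}^T\mathbf{R}_m^{-1}$, which is the form that appears in (11).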


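If the noise is white, $\mathbf{R}_m$ in (8) becomes

$$\mathbf{R}_m = \mathbf{U}\mathbf{R}_{\alpha_U}\mathbf{U}^T + \mathbf{R}_n = \mathbf{U}\mathbf{R}_{\alpha_U}\mathbf{U}^T + \sigma^2\mathbf{I} \tag{15}$$

where $\mathbf{R}_{\alpha_U} = \langle\boldsymbol{\alpha}_U\boldsymbol{\alpha}_U^T\rangle$.
Using the matrix inversion lemma again, and assuming that $\sigma^2$
is small enough compared to the undesired signatures $\mathbf{U}$,
$\mathbf{R}_m^{-1}$ can be computed as

$$\begin{aligned}
\mathbf{R}_m^{-1} &= \sigma^{-2}\mathbf{I} - \sigma^{-2}\mathbf{U}\,(\mathbf{R}_{\alpha_U}^{-1} + \sigma^{-2}\mathbf{U}^T\mathbf{U})^{-1}\,\mathbf{U}^T\sigma^{-2} \\
&= \sigma^{-2}\left[\mathbf{I} - \mathbf{U}\,(\sigma^2\mathbf{R}_{\alpha_U}^{-1} + \mathbf{U}^T\mathbf{U})^{-1}\,\mathbf{U}^T\right] \\
&\approx \sigma^{-2}\mathbf{P}_U^{\perp}.
\end{aligned} \tag{16}$$

Plugging the result of (16) into (14) gives

$$\begin{aligned}
\mathbf{w}_{\mathrm{CEM}} &= (\mathbf{d}^T\mathbf{R}_r^{-1}\mathbf{d})^{-1}\mathbf{R}_r^{-1}\mathbf{d}
 = (\mathbf{d}^T\mathbf{R}_m^{-1}\mathbf{d})^{-1}\mathbf{R}_m^{-1}\mathbf{d} \\
&\approx (\mathbf{d}^T\mathbf{P}_U^{\perp}\mathbf{d})^{-1}\mathbf{P}_U^{\perp}\mathbf{d} = \mathbf{P}_{\mathrm{OSP}}.
\end{aligned} \tag{17}$$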

Equation (17) implies that the OSP and the CEM are essentially the
same filter as long as the noise is white and its variance is
negligible compared to the signals, i.e., the SNR is sufficiently
high, which is generally true in hyperspectral imagery.

Fig. 1. HYDICE image scene used in the experiment. (a) Image scene. (b) Panel arrangement.
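The limiting behavior stated in (17) can be illustrated numerically. The following sketch builds $\mathbf{R}_r$ directly from (8) and (15) for synthetic signatures and measures how far the CEM filter is from the OSP filter as the noise variance shrinks. All numerical values (signatures, `R_au`, `alpha_d2`, and the list of variances) are assumptions chosen for illustration.

```python
import numpy as np

def normalize(v, d):
    """Scale v so that the filter satisfies w^T d = 1."""
    return v / (d @ v)

rng = np.random.default_rng(2)
K, p = 30, 4                              # hypothetical sizes
S = rng.uniform(0.1, 1.0, (K, p))         # synthetic signatures
d, U = S[:, 0], S[:, 1:]
R_au = np.diag([0.5, 0.3, 0.2])           # assumed <alpha_U alpha_U^T>

P = np.eye(K) - U @ np.linalg.solve(U.T @ U, U.T)
w_osp = normalize(P @ d, d)               # OSP filter of (4)

def cem_osp_gap(sigma2, alpha_d2=0.25):
    R_m = U @ R_au @ U.T + sigma2 * np.eye(K)        # (15)
    R_r = alpha_d2 * np.outer(d, d) + R_m            # (8)
    w_cem = normalize(np.linalg.solve(R_r, d), d)    # (6)
    return np.linalg.norm(w_cem - w_osp) / np.linalg.norm(w_osp)

gaps = [cem_osp_gap(s2) for s2 in (1e-1, 1e-3, 1e-5)]
```

In this setup the relative gap shrinks as the noise variance decreases, which is the high-SNR condition under which (16) becomes a good approximation.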


III.  Performance  Improvement  for  OSP 

The relationship between OSP and CEM derived in Section II can be
used to improve the performance of OSP. In Section II,
$\langle\alpha_d\boldsymbol{\alpha}_U\rangle = \mathbf{0}$ is assumed,
and white noise with large SNR is required. In this section, we will
show that if these two conditions are satisfied, the performance of
OSP can be improved.

A.  Data  Whitening 

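When applying the OSP to classify each class, its corresponding
signature in $\mathbf{S} = [\mathbf{d}\ \mathbf{U}]$ is treated as
$\mathbf{d}$ while the remaining $p-1$ signatures form $\mathbf{U}$.
So $\langle\alpha_d\boldsymbol{\alpha}_U\rangle = \mathbf{0}$ for each
pair of $\mathbf{d}$ and $\mathbf{U}$ means that
$\mathbf{R}_\alpha = \langle\boldsymbol{\alpha}\boldsymbol{\alpha}^T\rangle$
is a diagonal matrix. According to [10] and [11], the abundance
estimation using the OSP operator in (4) can be expressed as

$$\hat{\boldsymbol{\alpha}} = (\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T\mathbf{r}. \tag{18}$$

So $\mathbf{R}_\alpha$ can be approximated as

$$\begin{aligned}
\mathbf{R}_\alpha \approx \langle\hat{\boldsymbol{\alpha}}\hat{\boldsymbol{\alpha}}^T\rangle
&= \left\langle(\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T\mathbf{r}\mathbf{r}^T\mathbf{S}(\mathbf{S}^T\mathbf{S})^{-1}\right\rangle \\
&= (\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T\mathbf{R}_r\mathbf{S}(\mathbf{S}^T\mathbf{S})^{-1}.
\end{aligned} \tag{19}$$

If $\mathbf{R}_r$ is whitened to be the identity matrix and the
signatures in $\mathbf{S}$ are orthogonal to each other, i.e.,
$\mathbf{S}^T\mathbf{S} = \mathbf{I}$, then
$\mathbf{R}_\alpha = \mathbf{I}$ is a diagonal matrix.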
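The statement behind (18), that applying the OSP operator of (4) to each signature in turn gives the same estimate as the least-squares solution, can be checked on synthetic data. In the sketch below the signature matrix is random and the helper name `osp_filter_for` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
K, p = 25, 5                                 # hypothetical sizes
S = rng.uniform(0.1, 1.0, (K, p))            # synthetic signature matrix

# (S^T S)^{-1} S^T, one row per endmember, as in (18).
ls_rows = np.linalg.solve(S.T @ S, S.T)

def osp_filter_for(S, j):
    """OSP filter of (4) with column j as d and the other columns as U."""
    d = S[:, j]
    U = np.delete(S, j, axis=1)
    P = np.eye(S.shape[0]) - U @ np.linalg.solve(U.T @ U, U.T)
    return P @ d / (d @ P @ d)

match = all(np.allclose(ls_rows[j], osp_filter_for(S, j)) for j in range(p))
```

Each row of the least-squares operator responds with 1 to its own signature and 0 to the others, and lies in the column space of $\mathbf{S}$, which is why it coincides with the corresponding OSP filter.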

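The whitening of $\mathbf{R}_r$ can be achieved by generating a
data-whitening operator $\mathbf{P}_w$ as

$$\mathbf{P}_w^T = \mathbf{V}_2^{-1/2}\mathbf{V}_1^T. \tag{20}$$

Then, $\mathbf{R}_r$ can be whitened by applying $\mathbf{P}_w^T$ to
all the pixels. Here, $\mathbf{V}_1$ and $\mathbf{V}_2$ are the
eigenvector and eigenvalue matrices of $\mathbf{R}_r$, respectively.
They can be determined by the eigendecomposition
$\mathbf{R}_r = \mathbf{V}_1\mathbf{V}_2\mathbf{V}_1^T$, where
$\mathbf{V}_1 = [\mathbf{v}_1, \ldots, \mathbf{v}_k, \ldots, \mathbf{v}_K]$
with $\mathbf{v}_k$ being the $k$th eigenvector, and $\mathbf{V}_2$ is
a diagonal matrix with the $k$th diagonal item $\lambda_k$ being the
$k$th eigenvalue corresponding to $\mathbf{v}_k$, i.e.,
$\mathbf{V}_2 = \mathrm{diag}\{\lambda_1, \ldots, \lambda_k, \ldots, \lambda_K\}$.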
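A minimal sketch of the data-whitening operator in (20) is given below, assuming the pixels are stored as rows of a mean-removed matrix and that $\mathbf{R}_r$ has no zero eigenvalues. The function name `whitening_operator` and the synthetic data are assumptions for this example.

```python
import numpy as np

def whitening_operator(X):
    """X: (q, K) mean-removed pixels. Returns P_w^T = V2^{-1/2} V1^T of (20)."""
    R = X.T @ X / X.shape[0]             # sample correlation matrix R_r
    eigvals, V1 = np.linalg.eigh(R)      # R = V1 diag(eigvals) V1^T
    # Dividing column k of V1 by sqrt(lambda_k) gives V1 V2^{-1/2}.
    return (V1 / np.sqrt(eigvals)).T

rng = np.random.default_rng(4)
q, K = 2000, 6                           # hypothetical sizes
X = rng.standard_normal((q, K)) @ rng.standard_normal((K, K))
X -= X.mean(axis=0)                      # remove the sample mean

Pw_T = whitening_operator(X)
Xw = X @ Pw_T.T                          # apply P_w^T to every pixel
R_w = Xw.T @ Xw / q                      # correlation of the whitened data
```

`numpy.linalg.eigh` is used because $\mathbf{R}_r$ is symmetric; the whitened correlation matrix `R_w` should equal the identity up to rounding error.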

As for the orthogonality among the signatures in $\mathbf{S}$, the
Gram-Schmidt orthogonalization process can be used. However, it has
no effect when the signature subspace of $\mathbf{U}$ in
$\mathbf{P}_U^{\perp}$ is constructed using
$\mathbf{U}(\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T$. So in practice
the orthogonalization is unnecessary.


B.  Noise  Whitening 

The white-noise assumption may not hold in practice, and the noise in
the whitened data generally is not white. A noise-whitening process
based on noise estimation is therefore needed. The noise variance can
be estimated by exploiting the interband correlation, as in
residual-based estimation [19], or the intra/interband correlation, as
in linear regression model-based prediction [20]. A noise covariance
matrix can be estimated using the neighborhood difference method [21]
or the Laplacian operator [22]. We find that an accurate estimate of
band-to-band noise correlation is generally difficult to achieve. So
here we only estimate the noise variance and construct a diagonal
noise covariance matrix using the method in [20] because of its
relative efficiency and simplicity. After the estimated noise
covariance matrix is constructed as
$\boldsymbol{\Sigma}_n = \mathrm{diag}\{\sigma_{n1}^2, \ldots, \sigma_{nk}^2, \ldots, \sigma_{nK}^2\}$,
the noise can be whitened by applying $\boldsymbol{\Sigma}_n^{-1/2}$
to all the pixels.
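For a diagonal noise covariance matrix, noise whitening reduces to a per-band scaling. The sketch below assumes that the whitening operator is the inverse square root of the diagonal covariance (the usual choice) and that the per-band noise variances are already available; it does not implement the estimation method of [20]. The function name `noise_whiten` and the variance values are hypothetical.

```python
import numpy as np

def noise_whiten(X, noise_var):
    """Divide each band by its noise standard deviation.

    X: (q, K) pixels, noise_var: (K,) assumed per-band noise variances.
    Equivalent to applying diag(noise_var)^{-1/2} to every pixel.
    """
    return X / np.sqrt(noise_var)

rng = np.random.default_rng(5)
q, K = 20000, 5
noise_var = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # hypothetical, assumed known
noise = rng.standard_normal((q, K)) * np.sqrt(noise_var)

whitened = noise_whiten(noise, noise_var)
band_var = whitened.var(axis=0)                   # close to 1 in every band
```

In practice the quality of this step depends on how accurate the variance estimates are, since any estimation error is carried directly into the scaling.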

IV.  Experiment 

The data used in the experiments are Hyperspectral Digital Imagery
Collection Experiment (HYDICE) data. The image scene of size
64 x 64 shown in Fig. 1(a) was collected in Maryland in 1995 from a
flight altitude of 10000 ft with approximately 1.5 m GSD. Removing
bands with low SNR results in 169 data dimensions. There are 15 panels
present in the image scene, which are arranged in a 5 x 3 matrix. Each
element in this matrix is denoted by $p_{ij}$, with the row indexed by
$i$ and the column indexed by $j$. The three panels in the same row
were made from the same material and are of size 3 x 3, 2 x 2, and
1 x 1, respectively, and they are considered as a single class. The
ground truth map is provided in Fig. 1(b) and shows the precise
spatial locations of panel pixels, where the black pixels are referred
to as panel center pixels and the white pixels are considered as panel
pixels mixed with background pixels. The panels in each row are in the
same class, and the signatures of these five panel classes are very
similar. In addition to the panel signatures, two background
signatures were also generated from the grass field and tree line to
the left of the panels.

The classification results using CEM and OSP are shown in Figs. 2 and
3, respectively. CEM provided a better result because it correctly
detected the five panel classes and successfully eliminated background
noise. The result of OSP contained a larger number of background
pixels. It was improved after noise whitening, as shown in Fig. 4,
where the improvement was obvious when classifying the panels in rows
1, 4, and 5 in terms of better background signature elimination. This
is because the OSP makes the white-noise assumption, and the
noise-whitening process should be able to improve its performance.
Fig. 5 shows the result of OSP after data whitening. The improvement
was significant and the result was comparable to that of CEM in
Fig. 2. Fig. 6 presents the result of OSP after data whitening
followed by a noise-whitening process. The difference between Figs. 5
and 6 is negligible. This may be because the noise correlation was
greatly reduced by the data-whitening process. Using the noise
estimation method in Section III-B, we found that the noise variances
in 161 out of 169 bands were close to unity. So in this experiment,
the noise-whitening process based on the noise estimation technique in
Section III-B could not further improve the performance after the data
were whitened.

Fig. 2. Classification result of CEM. (Left to right) P1, P2, P3, P4, P5.

Fig. 3. Classification result of OSP. (Left to right) P1, P2, P3, P4, P5.

Fig. 4. Classification result of OSP after noise whitening. (Left to right) P1, P2, P3, P4, P5.

Fig. 5. Classification result of OSP after data whitening. (Left to right) P1, P2, P3, P4, P5.

Fig. 6. Classification result of OSP after data whitening and noise whitening. (Left to right) P1, P2, P3, P4, P5.


The images shown in Figs. 2-6 are grayscale images with the pixel gray
level corresponding to the abundance fraction of a specific
endmember. To make a quantitative comparison, we converted them to
binary images by using 50% of the maximal abundance fraction as the
cut-off threshold. Table I lists the number of correctly detected
pixels $N_D$ and false-alarm pixels $N_F$ using the CEM in Fig. 2, the
original OSP in Fig. 3, the improved OSP in Fig. 4 with the
noise-whitening process only (OSP-M1), the improved OSP in Fig. 5 with
the data-whitening process only (OSP-M2), and the improved OSP with
the data-whitening process followed by the noise-whitening process
(OSP-M3). The CEM could detect 14 out of 19 panel pixels without false
alarms. The result of OSP had a high false-alarm rate, which means the
panels could not be classified correctly, as shown in Fig. 3. The
OSP-M1 greatly reduced the false-alarm rates while detecting 17 out of
19 panel pixels. The OSP-M2 significantly improved the performance of
the OSP and provided the same results as CEM. No further improvement
was provided by the OSP-M3 in this experiment.

TABLE I
Tally of the Number of Pixels Detected and False-Alarmed Using Different Methods
(N: number of panel pixels; N_D: detected pixels; N_F: false-alarm pixels)

                 CEM         OSP       OSP-M1      OSP-M2      OSP-M3
          N   N_D   N_F   N_D   N_F   N_D   N_F   N_D   N_F   N_D   N_F
P1        3     2     0     1   232     2     0     2     0     2     0
P2        4     3     0     4  2693     4   315     3     0     3     0
P3        4     3     0     4    57     4     5     3     0     3     0
P4        4     3     0     4  2274     4    28     3     0     3     0
P5        4     3     0     4  1191     3     4     3     0     3     0
Total    19    14          17          17          14          14
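The thresholding and counting behind a tally such as Table I can be sketched as follows. The function name `tally`, the tiny abundance map, and the ground-truth mask are made up for illustration, and treating pixels exactly at the threshold as detections is an assumption, since the letter only states that 50% of the maximal abundance fraction is used as the cut-off.

```python
import numpy as np

def tally(abundance, truth, fraction=0.5):
    """Threshold an abundance map and count detections and false alarms.

    abundance: 2-D array of estimated abundance fractions.
    truth: boolean array of the same shape, True at panel pixels.
    """
    detected = abundance >= fraction * abundance.max()
    n_d = int(np.count_nonzero(detected & truth))    # correctly detected
    n_f = int(np.count_nonzero(detected & ~truth))   # false alarms
    return n_d, n_f

# Toy 3 x 3 example, not taken from the HYDICE scene.
abundance = np.array([[0.90, 0.10, 0.00],
                      [0.50, 0.60, 0.20],
                      [0.00, 0.30, 0.44]])
truth = np.array([[True,  False, False],
                  [True,  False, False],
                  [False, False, True]])

n_d, n_f = tally(abundance, truth)
```

Here the threshold is 0.45, so two of the three true pixels are detected and one background pixel is counted as a false alarm.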

This experiment demonstrates that either a noise-whitening process or
a data-whitening process can improve the performance of OSP. However,
the improvement from the noise-whitening process is limited by the
accuracy of the noise estimate.

V.  Conclusion 

The relationship between OSP and CEM is investigated. It has been
shown that when the noise is white with large SNR, the OSP and the CEM
perform very closely. In this case, they can be considered essentially
the same filter. Based on this relationship, the performance of OSP
can be improved through data-whitening and noise-whitening processes.
Future research will focus on a more effective technique to estimate
the noise covariance matrix to be used in the noise-whitening process.